Introduction. In the five years covered by this grant, January 1977 through
December  1981,  a  total  of  twenty  research  articles  were  written  for  publication  in 
technical  journals,  as  well  as  two  books  and  a  Ph.D.  thesis.  Although  these  are 
addressed  to  a  wide  variety  of  optimization  problems,  they  have  a  common  theme: 
the  characterization  and  computation  of  solutions  by  methods  based  on  subgradient 
analysis  and  duality.  Fundamental  advances  in  theory  are  embodied  in  this  work. 

The  following  topics  will  be  discussed  individually  below: 

A.  Multiplier  Algorithms  in  Nonlinear  Programming 

B.  Multistage  Stochastic  Optimization 

C.  Networks  and  Monotropic  Programming  Methods 

D.  Generalized  Subgradients  and  Nonsmooth  Optimization 

E. Marginal Values and Sensitivity Analysis

F. Genericity of Optimality Conditions

G.  Optimal  Control  of  Dynamical  Systems 
References  [1],  [2],  etc.,  are  to  work  performed  under  this  grant,  while  references 
[a],  [b],  etc.,  are  to  other  publications;  all  are  listed  at  the  end. 

A. Multiplier Algorithms in Nonlinear Programming

A general nonlinear programming problem in finitely many variables has the form

(1)  minimize $f_0(x)$ over all $x \in X$ satisfying
       $f_i(x) \le 0$ for $i = 1, \dots, s$,
       $f_i(x) = 0$ for $i = s+1, \dots, m$,

where $f_i : R^n \to R$ for $i = 0, 1, \dots, m$. A large number of methods have been
proposed over the years for solving such problems, but among the most popular and
effective  nowadays  are  the  so-called  multiplier  methods,  initiated  independently 
by  M.R.  Hestenes  [a]  and  M.J.D.  Powell  [b]  in  1969  and  developed  extensively  in  the 
mid-70's, especially by D. Bertsekas [c] and R.T. Rockafellar [d], [e]. The virtue
of these methods is that they avoid the difficulties of dealing directly with
nonlinear constraints by replacing (1) by a certain sequence of unconstrained
minimization problems. In this there is a similarity with penalty methods, and indeed,
multiplier  methods  have  largely  supplanted  the  latter  because  they  exhibit  the 
same  virtues  along  with  better  convergence  rates  and  greater  numerical  stability. 

Multiplier  methods  involve  Lagrange  multipliers  in  addition  to  a  penalty 
parameter.  They  are  based  on  the  study  of  the  augmented  Lagrangian  function  for 
problem  (1),  namely 

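(2)  $L(x,y,r) = f_0(x) + \sum_{i=s+1}^{m} \{ y_i f_i(x) + \tfrac{1}{2} r f_i(x)^2 \}$
       $+ \sum_{i=1}^{s} \begin{cases} y_i f_i(x) + \tfrac{1}{2} r f_i(x)^2 & \text{if } f_i(x) \ge -y_i/r, \\ -\tfrac{1}{2r} y_i^2 & \text{if } f_i(x) \le -y_i/r, \end{cases}$

for $x \in X$, $y \in R^m$, $r > 0$.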

This contrasts with the ordinary Lagrangian function

(3)  $L_0(x,y) = f_0(x) + \sum_{i=1}^{m} y_i f_i(x)$

for $x \in X$, $y \in R^s_+ \times R^{m-s}$. The augmented Lagrangian has the advantage
of furnishing saddle point representations of optimal solutions to (1) (augmented
duality) even in nonconvex programming. (This was established in [d] by Rockafellar,
who also was responsible for showing that formula (2) was the right way
to incorporate inequality constraints into the augmented Lagrangian.)
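
As a rough illustration of formula (2), the following Python sketch evaluates the augmented Lagrangian for user-supplied functions. The function name `augmented_lagrangian`, its argument layout and the toy data mentioned afterwards are assumptions made here for illustration only; they are not taken from the cited papers.

```python
def augmented_lagrangian(f0, fs, s, x, y, r):
    """Evaluate L(x, y, r) in the form of (2).

    fs[0..s-1] are inequality constraints (f_i(x) <= 0) and
    fs[s..m-1] are equality constraints (f_i(x) == 0).
    """
    total = f0(x)
    for i, (fi, yi) in enumerate(zip(fs, y)):
        v = fi(x)
        if i < s and v <= -yi / r:
            # inequality constraint, branch f_i(x) <= -y_i/r
            total += -yi * yi / (2.0 * r)
        else:
            # equality constraint, or inequality with f_i(x) >= -y_i/r
            total += yi * v + 0.5 * r * v * v
    return total
```

For instance, with the hypothetical data $f_0(x) = x^2$ and the single inequality $1 - x \le 0$, the point $x = 2$, $y = 1$, $r = 1$ lies on the boundary between the two branches, and both give the value $4 - 1/2 = 3.5$, which reflects the continuity of the expression in (2).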

The basic form of the Hestenes-Powell multiplier method begins with a choice
of starting values for $x$, $y$ and $r$ and in the general step takes

(4)  $x^{k+1} \approx \operatorname{argmin}_{x \in X} L(x, y^k, r_k)$,
     $y^{k+1} = y^k + r_k \nabla_y L(x^{k+1}, y^k, r_k)$,  $r_{k+1} \ge r_k$.

Here the notation means that $x^{k+1}$ is an approximate minimizer of the function
$L(\cdot, y^k, r_k)$ on $X$. The set $X$ is supposed to have a simple form (e.g. a
generalized rectangle), so this minimization, which uses $x^k$ as the starting
point,  can  be  effected  by  means  of  the  highly  efficient  algorithms  now  known 
for  (essentially)  unconstrained  optimization.  The  main  questions  concern  the 
stopping  rule  that  should  be  used  in  the  approximate  minimization,  the  strategy 
in  updating  the  penalty  parameter,  and  the  kinds  of  convergence  that  can  be 
obtained.  Generally  speaking,  it  is  possible  to  obtain  global  convergence  at  an 
arbitrarily good linear rate, without having $r_k \to \infty$. For this, however, one must
use a stopping criterion of the form

(5)  $F_k(x^{k+1}) - \inf_{x \in X} F_k(x) \le \epsilon_k$, where $F_k(x) = L(x, y^k, r_k)$,

which requires good estimates of the greatest lower bound for $L(\cdot, y^k, r_k)$ on $X$,
something not always available.
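
The basic multiplier iteration can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the problem (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$, with $X = R^2$), the fixed penalty parameter, and the use of plain gradient descent with a fixed iteration count in place of a stopping criterion are all choices made here, not taken from the report. The multiplier is updated by the classical Hestenes-Powell rule $y \leftarrow y + r f_1(x)$.

```python
def multiplier_method(r=10.0, outer=20, inner=500):
    """Toy multiplier method for: minimize x1^2 + x2^2 s.t. x1 + x2 - 1 = 0."""
    x = [0.0, 0.0]
    y = 0.0
    # 1 / (largest eigenvalue of the Hessian of L(., y, r)), a safe step size
    step = 1.0 / (2.0 + 2.0 * r)
    for _ in range(outer):
        # approximate minimization of L(., y, r), warm-started at the current x
        for _ in range(inner):
            c = x[0] + x[1] - 1.0
            g = [2.0 * x[0] + y + r * c, 2.0 * x[1] + y + r * c]
            x = [x[0] - step * g[0], x[1] - step * g[1]]
        # multiplier update for the single equality constraint
        y = y + r * (x[0] + x[1] - 1.0)
    return x, y
```

On this example the iterates approach $x = (1/2, 1/2)$ with multiplier $y = -1$, and a short calculation shows that the multiplier error shrinks by the factor $1/(1+r)$ per outer step, which illustrates how the linear rate improves as $r$ is increased without $r$ having to tend to infinity.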


The two articles [3] and [11] on this subject that were produced under this
grant propose a new version of the multiplier method that gets around these
difficulties with the stopping criterion and, as a byproduct, makes it possible
to solve an important class of "extended" convex programming problems, called
variational inequalities. The rule in the new version involves another parameter
$s_k > 0$:

(6)  $x^{k+1} \approx \operatorname{argmin}_{x \in X} \{ L(x, y^k, r_k) + (1/2s_k) \|x - x^k\|^2 \}$,
     $y^{k+1} = y^k + r_k \nabla_y L(x^{k+1}, y^k, r_k)$,  $r_{k+1} \ge r_k$,  $s_{k+1} \ge s_k$.


This  is  just  as  easy  to  execute  and  leads  to  the  same  nice  convergence  properties. 
Its  big  advantage  is  that  it  is  amenable  to  a  stopping  criterion  of  a  much  more 
convenient  type: 

(7)  $\| \operatorname{proj} \nabla F_k(x^{k+1}) \| \le \epsilon_k$,
     where $F_k(x) = L(x, y^k, r_k) + (1/2s_k) \|x - x^k\|^2$,

the projection being that of the gradient $\nabla F_k(x^{k+1})$ on the tangent cone to $X$
at $x^{k+1}$ (which for the usual sets $X$ is simple to compute).
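
A minimal sketch of the modified rule with a gradient-based stopping test follows. It assumes the same kind of toy problem as one might use for the basic method (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$ with $X = R^2$, so that the projection is the gradient itself), fixed parameters $r$ and $s$, and a tolerance sequence $\epsilon_k$ chosen here arbitrarily; none of these choices comes from articles [3] or [11].

```python
import math

def proximal_multiplier_method(r=10.0, s=1.0, outer=50):
    """Toy proximal multiplier method; inner loop stops when ||grad F_k|| <= eps_k."""
    x = [0.0, 0.0]
    y = 0.0
    # 1 / (largest Hessian eigenvalue of F_k), a safe gradient step
    step = 1.0 / (2.0 + 2.0 * r + 1.0 / s)
    for k in range(outer):
        eps = max(0.5 ** k, 1e-10)   # hypothetical tolerance schedule
        xk = x[:]                    # proximal center x^k
        for _ in range(10000):       # cap on inner work as a safeguard
            c = x[0] + x[1] - 1.0
            # gradient of F_k(x) = L(x, y, r) + (1/2s)||x - x^k||^2
            g = [2.0 * x[j] + y + r * c + (x[j] - xk[j]) / s for j in range(2)]
            if math.hypot(g[0], g[1]) <= eps:
                break
            x = [x[j] - step * g[j] for j in range(2)]
        y = y + r * (x[0] + x[1] - 1.0)
    return x, y
```

Note that the inner loop consults only the gradient of $F_k$ and never a function value or a lower bound, which is the practical point of the stopping test.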

A very interesting feature of the modified rule is that everything can be
carried out in terms of the mappings $\nabla F_k$ alone. The function values $F_k(x)$
don't need to play any role. This being the case, it is possible to replace
the gradient $\nabla f_0$ in problem (1) by a much more general kind of mapping
$A : R^n \to R^n$. The sequence $(x^k, y^k)$ generated by the algorithm then converges
(under mild assumptions) to a solution $(x, y)$ to the so-called variational
inequality obtained when $A(x)$ is substituted for $\nabla f_0(x)$ in writing down the
Kuhn-Tucker conditions for optimality in (1).
All this works in particular when $A$ is a monotone mapping in the sense that

$[A(x') - A(x)] \cdot [x' - x] \ge 0$ for all $x', x \in R^n$.

Indeed,  much  of  the  theory  of  multiplier  methods  rests  on  the  study  of  such 
mappings,  so  this  is  a  very  natural  extension.  It  provides  a  new  computational 
handle  on  many  problems  in  partial  differential  equations  that  can  be  represented 
as  variational  inequalities. 
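
To make the monotonicity condition concrete, here is a small sketch with a hypothetical affine mapping on $R^2$ whose matrix is not symmetric, so that it is not the gradient of any function, yet it is monotone. The mapping `A` and the helper `monotonicity_gap` are illustrative names introduced here.

```python
def A(x):
    # Affine mapping with matrix [[1, 1], [-1, 1]]; the matrix is not symmetric,
    # so A is not a gradient mapping.
    return (x[0] + x[1] - 1.0, -x[0] + x[1])

def monotonicity_gap(mapping, xp, x):
    """Return [A(x') - A(x)] . [x' - x], which is >= 0 for a monotone mapping."""
    ap, a = mapping(xp), mapping(x)
    return sum((ap[j] - a[j]) * (xp[j] - x[j]) for j in range(len(x)))
```

For this particular mapping the skew part of the matrix cancels in the inner product, so the gap equals the squared distance between the two points and is therefore never negative.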

Much  remains  to  be  done  in  connection  with  these  problems  and  their  special 
structures. The relationship between the parameters $r_k$ and $s_k$, and the
strategies  for  updating  them,  would  benefit  from  further  study  too. 

B.  Multistage  Stochastic  Optimization.  A  common  but  difficult  situation  to  deal 
with  in  applications  of  optimization  is  the  kind  where  decisions  must  be  made 
here  and  now,  but  the  outcomes  of  these  decisions  will  be  strongly  affected  by 
future  events  about  which  there  is  only  statistical  information.  Usually, 
recourses  are  available  in  the  future  in  order  to  correct  the  effects  of  the 
here-and-now  decisions,  after  the  true  situation  becomes  better  known.  But  the 
cost  and  scope  of  the  recourses  may  depend  too  on  what  has  to  be  decided  in 
advance.  Multistage  stochastic  optimization  problems,  also  called  stochastic 
programming  problems  or  recourse  problems,  are  an  attempt  to  model  this  state 
of  affairs. 

To keep things simple, let us imagine a situation where at times $t = 1, 2, \dots, N$
a vector $x_t$ must be chosen from a space $R^{n_t}$ in response to an observation
$w_t$ (which is a random vector variable with known distribution, at least in the
most elementary versions of the model). Present decisions cannot depend on
future observations, so a decision policy must be a function of the special form

$x(w) = (x_1(w_1), x_2(w_1, w_2), \dots, x_N(w_1, w_2, \dots, w_N))$.

Such a function $x$ is said to be nonanticipative. The problem is to minimize,
over all nonanticipative functions $x$, an expected cost

$E_w\{ f_0(w, x(w)) \}$

subject to constraints of the form

(8)  $f_i(w, x(w)) \le 0$ almost surely, $i = 1, \dots, m$.
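
The nonanticipativity requirement is easy to state in code when there are only finitely many scenarios. The sketch below assumes that setting (an assumption made here; the report treats general distributions), with a policy stored as a dictionary from scenarios $(w_1, \dots, w_N)$ to decisions $(x_1, \dots, x_N)$. The names `is_nonanticipative` and `expected_cost` are hypothetical.

```python
def is_nonanticipative(policy):
    """Check that x_t depends only on the observations (w_1, ..., w_t)."""
    seen = {}
    for w, x in policy.items():
        for t in range(len(w)):
            history = w[: t + 1]   # what has been observed when x_t is chosen
            if seen.setdefault(history, x[t]) != x[t]:
                return False
    return True

def expected_cost(policy, prob, cost):
    """Expected value of cost(w, x(w)) over finitely many scenarios."""
    return sum(prob[w] * cost(w, x) for w, x in policy.items())
```

Two scenarios that share the first observation must therefore share the first decision, while later decisions may differ once the scenarios diverge.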

The  theory  of  such  problems  was  developed  by  Rockafellar  and  Wets  in  the 
two-stage case ($N = 2$) in a series of papers [f], [g], [h], [i]. The foundation
for  the  N-stage  case  was  laid  in  [j]. 

Article [1], written under the present grant, derives for the first time the
existence of Lagrange multipliers $y_i(w)$ for the constraints (8) as a
characterization of the optimal decision function $x$. It develops special properties
in the case of separable constraint functions

$f_i(w, x(w)) = f_{i1}(w_1, x_1(w_1)) + f_{i2}(w_2, x_2(w_1, w_2)) + \cdots$

and  explores  certain  connections  with  stochastic  optimal  control.  Convexity  is 
assumed  throughout. 

The  results  are  very  complete  and  satisfying  as  regards  optimality  conditions 
and  their  interpretation,  and  they  can  fairly  be  viewed  as  a  landmark  in  stochastic 
optimization on such terms. Nevertheless, they are only theoretical results.
They are an important step towards computation, but much work on actual algorithms
will be needed before problems of this highly important kind can be solved
practically  and  efficiently.  Large-scale  decomposition  techniques  in  terms  of 
the  Lagrangian  price  vectors  y(w)  will  be  required.  The  theory  of  nonconvex 
problems  will  eventually  need  to  be  developed  too. 

C.  Networks  and  Monotropic  Programming. 

A monotropic programming problem is an optimization problem in which a
convex function having a representation of the type

(9)  $F(u_1, \dots, u_m) = \sum_{j} f_j(a_{j1} u_1 + a_{j2} u_2 + \cdots + a_{jm} u_m - b_j)$

is minimized subject to linear equality and inequality constraints on the variables
$u_1, \dots, u_m$. Linear and quadratic programming problems are a special case, as are
separable convex programming problems. Indeed, any monotropic programming problem
can be reduced to the canonical form

(10)  minimize $f_1(x_1) + \cdots + f_N(x_N)$
      over all $x = (x_1, \dots, x_N) \in K \subset R^N$ satisfying
      $x_j \in C_j$ for $j = 1, \dots, N$,

where $K$ is a linear subspace of $R^N$ (described by a system of homogeneous
linear equations), each $C_j$ is a real interval, and $f_j$ is a closed proper
convex function of a single real variable, having $C_j$ as its effective
domain. Associated with this is a canonical dual problem of the same sort:

(11)  maximize $-g_1(y_1) - \cdots - g_N(y_N)$
      over all $y = (y_1, \dots, y_N) \in L \subset R^N$ satisfying
      $y_j \in D_j$ for $j = 1, \dots, N$,

where $L$ is the linear subspace orthogonal to $K$ (expressible by an adjoint
system of equations), and $g_j$ is the convex function on $R$ conjugate to $f_j$;
the interval $D_j$ is the effective domain of $g_j$.
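
As a small worked instance of the conjugacy between the primal and dual cost functions, consider a linear cost on a capacity interval, the kind of term that arises in linear network flow problems. The constants $c$ (unit cost) and $\kappa > 0$ (capacity) are symbols introduced here for illustration.

```latex
f_j(x_j) = c\,x_j \ \text{ for } x_j \in C_j = [0,\kappa], \qquad
g_j(y_j) = \sup_{x_j \in [0,\kappa]} \{ x_j y_j - c\,x_j \}
         = \kappa \max\{0,\; y_j - c\}, \qquad D_j = R .
```

The supremum of the linear function $(y_j - c)\,x_j$ over $[0,\kappa]$ is attained at $x_j = \kappa$ when $y_j > c$ and at $x_j = 0$ otherwise, so a piecewise linear dual cost appears even though the primal cost is linear on its interval.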

This  kind  of  duality,  which  can  be  utilized  almost  as  fully  and  explicitly 
as linear programming duality (which, by the way, it subsumes), is a characteristic
feature of monotropic programming. It makes possible a whole range of methods
and  approaches  that  otherwise  would  not  be  available.  This  is  why  such  problems 
need  to  be  recognized  and  treated  as  a  class  apart. 

The main theoretical guideline for general monotropic programming comes
from network programming, namely the case where $x_j$ represents the flow in the
$j$th arc of a certain directed graph and $y_j$ is the "tension" across the arc. In
that setting, $K$ is the space of circulations (flows conserved at every node),
and $L$ is the space of tensions representable as potential differences (relative
to some potential function defined on the set of nodes of the graph). An
enormous number of practical problems in operations research, including logistics,
warehousing, project scheduling and the analysis of pipe systems, fall into this
category.

Article  [13]  introduces  basic  descent  methods  for  monotropic  programming 
problems.  It  demonstrates  that  any  such  method,  applied  to  either  problem  (10) 
or  (the  negative  of)  problem  (11),  as  is  always  possible  due  to  total  symmetry, 
will  inevitably  solve  both  (10)  and  (11).  This  computational  circumstance  leads 
to  a  new  theoretical  result:  a  constructive  proof  of  the  duality  theorem  for 
monotropic programming, i.e. the fact that the optimal values in (10) and (11)
must be equal unless both problems fail to be feasible. This theorem is a powerful
tool in the design and interpretation of algorithms. It holds a unique position
in the duality literature in not requiring either the linearity of objectives or
any kind of strict feasibility.
Algorithms in monotropic programming have a distinctly combinatorial nature:
descent is in special directions induced by a matroidal substructure associated
with the linear subspaces $K$ and $L$. This subject has until now not been
investigated or even recognized as a unified whole (although examples in linear
and network programming are well known), and herein lies the novelty and
significance of the monograph [16]. There is too much in this work to be described
here.  For  a  better  idea  of  the  contribution,  the  preface  to  [16],  the  table  of 
contents  and  the  section  of  comments  at  the  end  of  each  chapter  may  be  consulted. 
Many  new  computational  methods  and  conceptual  innovations  are  provided.  The  book 
includes  the  first  comprehensive  treatment  of  nonlinear  network  flow  problems 
and  separable  convex  programming. 

D.  Subgradient  Analysis  and  Nonsmooth  Optimization. 

This  is  another  big  subject  on  which  far  too  much  has  been  accomplished  in 
the  five  years  under  the  present  grant  for  there  to  be  any  hope  of  giving  more 
than  a  brief  indication  here.  Motivation  starts  with  the  fact  that  optimization 
problems very frequently involve functions that are not differentiable, at least
not  everywhere. 

In  direct  terms,  one  can  run  into  cost  functions  that  are  merely  piecewise 
smooth  (the  derivatives  jump  at  certain  breakpoints),  as  well  as  "max  functions" 
of  the  form 

(12)  $h(x) = \max_{i \in I} h_i(x)$  ($I$ = some index set),

whose graphs exhibit "corner points" of a rather complicated sort. Convex
functions on $R^n$ are always representable as max functions (12) with each $h_i$ affine
(i.e. linear-plus-a-constant), and as this suggests, they are not necessarily
differentiable. Thus economic models in which convexity is postulated, but
differentiability is less natural as a fundamental assumption, fall under the
heading of "nonsmooth optimization".
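
A one-dimensional sketch may help in picturing the corner points of a max function of the form (12). It assumes finitely many affine pieces $h_i(x) = a_i x + b_i$ on the real line (a simplification chosen here), and reports the value together with the slopes of the pieces that attain the maximum. The name `max_affine` is hypothetical.

```python
def max_affine(pieces, x):
    """pieces: list of (slope, constant) pairs for h_i(x) = slope * x + constant.

    Returns h(x) = max_i h_i(x) and the sorted distinct slopes of the active pieces.
    """
    values = [a * x + b for a, b in pieces]
    h = max(values)
    active = sorted({a for (a, _), v in zip(pieces, values) if v == h})
    return h, active
```

With the pieces $x$ and $-x$, which represent $|x|$, a single slope is active away from the origin, while at the origin both slopes $-1$ and $1$ are active. More than one active slope signals a corner, and in convex analysis each active slope is a subgradient of $h$ at that point.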

Nondifferentiable functions also arise indirectly. In linear programming,
for instance, the optimal value

(13)  $\varphi(b) = \inf\{ cx \mid x \ge 0,\ Ax \ge b \}$

is only piecewise linear with respect to the vector $b$. The role of the optimal
solutions to the dual problem, as vectors of "shadow prices" associated with the
resources in the primal problem, cannot be understood without reference to this
potential lack of differentiability of $\varphi$. More generally, the quantity

(14)  $p(v) = \inf\{ f_0(v,x) \mid f_1(v,x) \le 0, \dots, f_m(v,x) \le 0 \}$,

giving the optimal value in a constrained minimization problem in $x$ which depends
on a parameter vector $v$, is generally not differentiable with respect to $v$,
even if the functions $f_0, f_1, \dots, f_m$ themselves are infinitely smooth.
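
A one-line example, constructed here for illustration, shows how smooth data can produce a nonsmooth optimal value of the type (14). Take a single real variable $x$, a real parameter $v$, and linear functions only:

```latex
f_0(v,x) = x, \quad f_1(v,x) = v - x, \quad f_2(v,x) = -v - x
\quad\Longrightarrow\quad
p(v) = \inf\{\, x \mid x \ge v,\ x \ge -v \,\} = |v| .
```

Every function involved is linear, yet $p$ fails to be differentiable at $v = 0$, where the active constraint switches from one to the other.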

Exact  penalty  methods  for  solving  nonlinear  programming  problems,  as  well 
as  decomposition  techniques  and  duality-based  computational  schemes  of  the  sort 
that  is  now  very  popular  in  branch-and-bound  approaches  to  combinatorial  problems, 
typically lead to the consideration of auxiliary functions that are not smooth.
Sometimes these functions take on quite a complicated form, as in the case of
problems of engineering design where specifications can be met within certain
tolerances by a "tuning" process after basic manufacture; see E. Polak [k] (also
introduction to [12]).
Nondifferentiable convex functions have been treated successfully for many
years. Many of the techniques were developed by R.T. Rockafellar and presented
in his book Convex Analysis [l]. The big breakthrough for nonconvex functions,
however, came with the thesis work of F.H. Clarke under a predecessor of the
present grant. Clarke was able to define subgradients of arbitrary lower
semicontinuous functions on $R^n$ in a manner totally in harmony with the convex case
and the classical analysis of smooth functions; see [m]. Clarke's approach was
somewhat roundabout, though, and his definitions seemed to depend unduly on the
Euclidean norm in $R^n$, which tended to hamper applications, not to mention
extensions to problems in infinite-dimensional spaces.

One of the main accomplishments under the present grant has been the further
development and strengthening of the theory of generalized subgradients of
nonconvex functions, especially with an eye towards certain applications that will
be discussed in the next section. Deep, fundamental results were obtained in
[6], [7], [8] and [18]. These are long papers, and as mentioned above, it is
impossible to go into the details here. Fortunately that isn't necessary, since
the recently published monograph [12], also written under this grant, provides a
readable survey, in fact the very first to become available on this burgeoning
subject.


E.  Marginal  Values  and  Sensitivity  in  Nonlinear  Programming. 

The  generalized  subgradient  analysis  described  above  is  ideally  suited  to 
elucidating  the  properties  of  the  sort  of  nonsmooth  function  appearing  in 
formulas  (13)  and  (14).  Let  us  imagine  more  generally  a  problem  of  the  form 


($P_{u,v}$)  minimize $f_0(v,x)$ over all $x \in D(v)$ satisfying
             $f_i(v,x) + u_i \le 0$ for $i = 1, \dots, s$,
             $f_i(v,x) + u_i = 0$ for $i = s+1, \dots, m$,

where $v \in R^d$ and $u = (u_1, \dots, u_m) \in R^m$. Denote the optimal value in ($P_{u,v}$) by
$p(u,v)$; then $p$ is an extended-real-valued function on $R^m \times R^d$, and under
mild assumptions $p$ is lower semicontinuous.

Although no amount of differentiability assumptions on $f_0, f_1, \dots, f_m$ will
imply differentiability of $p$, there are special situations where it has been
known for some time that for a particular $(\bar{u}, \bar{v})$, the gradient $\nabla p(\bar{u}, \bar{v})$ exists.
These tend to be situations where problem ($P_{\bar{u},\bar{v}}$) has a unique globally optimal
solution $\bar{x}$, and this $\bar{x}$ happens to satisfy second-order optimality conditions
of the strongest kind. The interesting thing is that in such situations

(15)  $\nabla p(\bar{u}, \bar{v}) = (\bar{y}, \bar{z})$ with $\bar{z} = \nabla_v \ell(\bar{v}, \bar{x}, \bar{y})$,

where $\bar{y}$ is the unique Lagrange multiplier vector associated with $\bar{x}$, and

(16)  $\ell(v, x, y) = f_0(v,x) + \sum_{i=1}^{m} y_i f_i(v,x)$.

The reason this is so important is that it indicates a fundamental connection
between the dual variables that occur in optimality conditions for problem
($P_{\bar{u},\bar{v}}$) and the possible rates of change of the function $p$ at $(\bar{u}, \bar{v})$.
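
The link between multipliers and rates of change of $p$ can be checked numerically on a tiny example. The sketch below assumes a hypothetical problem with one variable, one inequality constraint and no parameter $v$: minimize $x^2$ subject to $(1 - x) + u \le 0$. Its solution is $x = \max\{0, 1 + u\}$, and the stationarity condition $2x - y = 0$ gives the multiplier. All function names are introduced here for illustration.

```python
def optimal_value(u):
    """p(u) for: minimize x^2 subject to (1 - x) + u <= 0."""
    x = max(0.0, 1.0 + u)   # the constraint reads x >= 1 + u
    return x * x

def multiplier(u):
    """Lagrange multiplier y from stationarity of x^2 + y * (1 - x + u) in x."""
    x = max(0.0, 1.0 + u)
    return 2.0 * x

def central_difference(f, u, h=1e-6):
    """Two-sided finite-difference estimate of the derivative of f at u."""
    return (f(u + h) - f(u - h)) / (2.0 * h)
```

At $u = 0$ the finite-difference estimate of $p'(0)$ agrees with the multiplier $y = 2$, in line with the $u$-component of formula (15).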

Rates  of  change  of  p  are  called  marginal  values.  They  are  significant  in 
the economic analysis of optimization models where the components of $u$ and $v$
represent production coefficients, costs and resource availabilities that may be
subject to fluctuation. They also have a role in determining the stability of
computational procedures which could be at the mercy of errors in the specification
of $u$ and $v$. Furthermore, the ability to calculate, or at least estimate,
such rates of change is valuable in decomposition techniques.

For example, the real problem to be solved may be one in which only $u$ is
given:

($Q_u$)  minimize $f_0(v,x)$ over all $(v,x) \in E$ satisfying
         $f_i(v,x) + u_i \le 0$ for $i = 1, \dots, s$,
         $f_i(v,x) + u_i = 0$ for $i = s+1, \dots, m$.

For each $v$, the corresponding subproblem of minimizing in $x$ can be identified
with problem ($P_{u,v}$) in the case of $D(v) = \{ x \mid (v,x) \in E \}$. The master
problem then consists of minimizing $p(u,v)$ with respect to $v$ for fixed $u$.
A decomposition of this sort may be very attractive in cases where ($P_{u,v}$) is
particularly easy to solve for each $(u,v)$ (a well known technique due to
Benders). However, it does necessitate the minimization of a nonsmooth function
$p$. Obviously, any information about directional rates of change of $p$ is
crucial to the success of such an approach.

In certain situations in convex programming, it has been known that formula
(15) could be stated in a more general way in terms of subgradients rather than
gradients, such subgradients being a way of describing one-sided directional
derivatives. The challenge of the work under the present grant was to extend
this somehow to nonconvex programming. Since one-sided derivatives of $p$ in
the ordinary sense do not necessarily exist, even under smoothness assumptions
on the functions $f_i$ and set $D$, basic theoretical developments were needed.
These have been described in the preceding section.

Article [14] provided a key by giving an exact formula for the subgradient
set $\partial p(\bar{u}, \bar{v})$ in Clarke's sense in terms of extended limits of Lagrange
multiplier vectors $y^k$ associated with optimal solutions $x^k$ to neighboring problems
($P_{u^k,v^k}$), as $(u^k, v^k) \to (\bar{u}, \bar{v})$. In fact, the multiplier vectors in question
satisfy the saddle point condition for the augmented Lagrangian for ($P_{u^k,v^k}$).
Thus the augmented Lagrangian described in the first section of this report was
shown to have theoretical powers much beyond what might be expected simply from
its  role  in  computations. 

The next paper, [15], developed this formula into estimates for $\partial p(\bar{u}, \bar{v})$
not just in terms of limits of multipliers for neighboring problems, but in terms of
certain multiplier vectors associated with optimality conditions for ($P_{\bar{u},\bar{v}}$) itself.
Actually, this was a two-way process: the mathematical machinery that had
been devised was sensitive enough to allow optimality conditions to be stated
for solutions $\bar{x}$ to ($P_{\bar{u},\bar{v}}$), even when the functions $f_i$ are not smooth and
the multifunction $D$ is merely of closed graph. These conditions were shown
to be necessary on the basis of differential properties of $p$, a new technique
in nonconvex optimization that sheds much light on the subject of "constraint
qualifications". In particular, multiplier rules of Clarke [m] and
Hiriart-Urruty [o] were sharpened in this way.

Many  consequences  will  be  obtained  from  the  results  in  [15],  due  to  their 
depth  and  far-reaching  generality.  This  work  is  the  culmination  of  much  effort. 

Article  [17]  deals  with  certain  more  abstract  versions  of  the  formulas 
in  [15],  true  in  part  for  infinite-dimensional  problems.  (The  framework  in  [15] 
is  intrinsically  finite-dimensional.) 

An application to second-order conditions is carried out in [19]. The
formulas in [15] are refined in terms of second derivative information, and
in this way new results on necessary conditions for optimality are again
obtained.


F.  Genericity  of  Optimality  Conditions 

Are the standard optimality conditions in nonlinear programming "usually"
satisfied? This is the question tackled in article [9]. The question is significant
because it is not possible, as a practical matter in most applications, to
check  whether  a  given  nonlinear  programming  problem  (1)  satisfies  the  constraint 
qualifications  and  strengthened  forms  of  the  second-order  optimality  conditions 
on  which  the  analysis  of  many  algorithms,  etc.,  depends.  One  often  hears  the 
argument  that  it  is  all  right  to  base  results  on  the  assumption  of  such  conditions, 
because  they  hold  in  "typical"  problems.  But  what  does  that  assertion  really  mean? 

One approach is to consider families of problems that depend on parameters,
like ($P_{u,v}$) in the preceding section. These parameters can be imagined as random
variables with known distributions. The question can then be phrased as follows:
consider the set of all pairs $(u,v)$ such that ($P_{u,v}$) has a locally optimal
solution which fails to satisfy certain conditions, and ask whether this set
represents an event of probability zero. Now as long as the distributions are
continuous, this can be subsumed by a much simpler question that doesn't involve
knowledge of the particular statistical distributions of the parameters, namely, whether
the exceptional set of pairs $(u,v)$ is negligible (i.e. of measure zero in the
Lebesgue sense).

An affirmative answer to this question was given in [9] and [10] for a
fundamental class of parameterizations of nonlinear programming problems. This was
complemented by results in [14] on the genericity of uniqueness of optimal solutions.

J. E. Spingarn in his Ph.D. thesis [21] developed a more complete theory. It
was necessary to consider other classes of parameterizations in order to have a
practical result, but it was not clear until his work how to identify the ones
with the desired property that "almost all" problems in the parameterized family
were well-behaved. He used differential topology and other mathematical tools
to show how certain kinds of constraint structures could be kept fixed
(unparameterized) without unbalancing the abundance of "good" problems over the "bad"
in a given family. These results have been published by Spingarn in [22] and [23].

Besides serving to justify certain approaches to computation, it is expected
that these ideas will have a role to play in multistage stochastic programming
(see B above). In that subject one has to treat, as a matter of course, nonlinear
programming subproblems which depend on random variables, and whose optimal
solutions therefore are random variables too. It would be impossible to get very far
without  theoretical  assurance  that  such  optimal  solution  random  variables  can  be 
analyzed  in  terms  of  nice  kinds  of  multiplier  conditions  almost  surely. 


G.  Optimal  Control  of  Dynamical  Systems. 

Three  publications  under  the  present  grant  come  under  this  heading,  [4],  [5] 
and [21]. In [4] the subject of duality in problems of optimal control is
surveyed, and also a number of recent developments concerning the existence of optimal
arcs and the conditions which characterize them. This exposition provides a good
introduction to the general approach to optimal control that can be made in terms
of extended-real-valued hamiltonians and subdifferential calculus.

Paper  [5]  describes  in  terms  of  models  of  optimal  economic  growth  a  number  of 
results  and  open  questions  concerning  control  problems  over  an  infinite  time 
interval.  The  main  question  in  such  problems  is  what  kind  of  behavior  is  naturally 
optimal  in  a  "self-sustaining"  sense,  i.e.  in  a  steady-state  manner  that  could  be 
prolonged  indefinitely.  The  concepts  that  arise  in  this  connection  are  interesting 
for several basic reasons, especially as a description of limiting behavior in
various situations, even though real problems never involve infinite time.

State constraints are the subject of the most recent article [21]. It has
been shown that such constraints in an optimal control problem can cause jumps
(discontinuities) in the adjoint variables. Conversely, the possibility of jumps
in the primary variables can be linked to inherent state constraints on the adjoint
variables. This is what is proved in [21] through detailed analysis of a
particular class of models of interest in economics.